Data Visualization

Cory Whitney

Data visualization: getting stuck

  • Open RStudio

  • Type ? followed by a function, package or data set name in the R console, e.g. ?barplot

  • Search the web for a copy of the error message, adding "R" to the query

  • Help > Cheatsheets > Data Visualization with ggplot2
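
The help options above can also be called as ordinary functions, which is handy inside a script. A minimal sketch, using barplot and the datasets package purely as illustrative targets:

```r
# help("barplot") is the function form of typing ?barplot in the console
barplot_help <- help("barplot")

# data() lists the example data sets that ship with a package;
# "datasets" is the package that contains iris
available_data <- data(package = "datasets")
head(available_data$results[, c("Item", "Title")])

# ?? searches all installed help pages when the exact name is unknown:
# ??"bar plot"
```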

Data visualization: getting help

  • Many talented programmers use R
  • Some of them scan the web and answer questions, for example on Stack Overflow

https://stackoverflow.com/

Getting your data in R

Load data

  • Load the data
participants_data <- read.csv("participants_data.csv")
  • Keep your data in the same folder structure as the .Rproj file, at or below its level, so that the relative path in read.csv() resolves from the project folder
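
If read.csv() reports that it cannot open the file, the working directory is the usual suspect. The following is a small sketch for checking this before loading; the file name comes from the slide above, and the rest is only one possible way to do the check:

```r
# File name taken from the slide; it must sit in (or below) the project folder
data_file <- "participants_data.csv"

# getwd() shows the folder that relative paths are resolved against
getwd()

if (file.exists(data_file)) {
  participants_data <- read.csv(data_file)
  str(participants_data)   # column names and types
  head(participants_data)  # first six rows
} else {
  message("Cannot find ", data_file, " in ", getwd())
}
```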

Creating a barplot in base R

R has several systems for making graphs

  • Base R
  • Create a barplot with the table() and barplot() functions
participants_barplot <- table(participants_data$academic_parents)

barplot(participants_barplot)

[Figure: Bar plot of number of observations of binary data related to academic parents]
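
The same table() and barplot() pattern can be tried on a data set that ships with R. This sketch uses mtcars (not part of the course data) and adds a title and axis labels; the label texts are made up for illustration:

```r
# mtcars ships with R; am is a binary variable (0 = automatic, 1 = manual)
transmission_counts <- table(mtcars$am)

# names.arg relabels the bars; main, xlab and ylab add a title and axis labels
barplot(transmission_counts,
        names.arg = c("Automatic", "Manual"),
        main = "Cars by transmission type",
        xlab = "Transmission",
        ylab = "Number of cars")
```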

ggplot2: 'Grammar of Graphics' Overview

Many libraries and functions for graphs in R…

  • ggplot2 is one of the most elegant and most versatile.

  • ggplot2 implements the grammar of graphics, a coherent system for describing and building graphs.

  • Do more and do it faster by learning one system and applying it in many places.

  • Learn more about ggplot2 in “The Layered Grammar of Graphics”

http://vita.had.co.nz/papers/layered-grammar.pdf
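
To make the idea of a layered grammar concrete, the sketch below builds one plot piece by piece. It uses the built-in iris data, and the object name p and the label texts are arbitrary choices for illustration:

```r
library(ggplot2)

# 1. Data and aesthetic mappings: which variables go on which axes
p <- ggplot(data = iris,
            aes(x = Sepal.Length,
                y = Petal.Length))

# 2. A geom layer: how the observations are drawn
p <- p + geom_point()

# 3. Labels and themes are added with + in the same way
p <- p +
  labs(x = "Sepal length (cm)",
       y = "Petal length (cm)") +
  theme_minimal()

p
```

Because every component is added with +, a plot can be stored in an object and extended later without rewriting the earlier steps.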

ggplot2: names and email

Example from your data

library(ggplot2)
ggplot(data = participants_data, 
       aes(x = letters_in_first_name, 
           y = days_to_email_response)) + 
  geom_point()

[Figure: Scatterplot of days to email response as a function of the letters in your first name]

Want to understand how all the pieces fit together? See the R for Data Science book: http://r4ds.had.co.nz/

ggplot2: add color and size

ggplot(data = participants_data, 
       aes(x = letters_in_first_name, 
           y = days_to_email_response, 
           color = academic_parents, 
           size = working_hours_per_day)) + 
  geom_point()

[Figure: Scatterplot of days to email response as a function of the letters in your first name, with colors representing binary data related to academic parents and working hours per day as bubble sizes]

Make more graphs

ggplot2: iris data

Example from Anderson's iris data set

ggplot(data = iris, 
       aes(x = Sepal.Length, 
           y = Petal.Length, 
           color = Species, 
           size = Petal.Width)) + 
  geom_point()

[Figure: Scatterplot of iris petal length as a function of sepal length, with colors representing iris species and petal width as bubble sizes]

ggplot2: diamonds price

aes() accepts expressions as well as bare column names, so a variable can be transformed in place, for example with log()

ggplot(data = diamonds,
       aes(x = log(carat),
           y = log(price),
           alpha = 0.2)) + 
  geom_point()

ggplot2: diamonds color shape

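
The heading suggests mapping categorical variables of diamonds to point color and shape. The following is a possible sketch, not the original slide code: the choice of color and cut as the mapped variables and the 1,000-row subsample are assumptions made for illustration.

```r
library(ggplot2)

# Subsample so that individual point shapes stay visible (size is arbitrary)
set.seed(1)
diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]

p_color_shape <- ggplot(data = diamonds_small,
                        aes(x = carat,
                            y = price,
                            color = color,   # diamond color grade, D to J
                            shape = cut)) +  # cut quality, Fair to Ideal
  geom_point()

# ggplot2 warns here that shapes are not advised for an ordered variable
p_color_shape
```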

ggplot2: set parameters

Set parameters manually with I() (Inhibit Interpretation/Conversion of Objects). Wrapping a value in I(), e.g. alpha = I(0.2), tells ggplot2 to use it as given instead of mapping it to a scale.
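
To illustrate the difference, the sketch below sets up the diamonds scatterplot three ways. It assumes a reasonably recent ggplot2, in which values wrapped in I() inside aes() are used as given; setting the value outside aes() is the more common idiom and is included for comparison. The object names are arbitrary.

```r
library(ggplot2)

base_plot <- ggplot(data = diamonds,
                    aes(x = log(carat),
                        y = log(price)))

# Mapped: 0.2 is treated as data, so it gets a scale and a legend entry
p_mapped <- base_plot + geom_point(aes(alpha = 0.2))

# Set with I(): 0.2 is used as the literal transparency
p_asis <- base_plot + geom_point(aes(alpha = I(0.2)))

# Set outside aes(): a fixed parameter of the layer
p_fixed <- base_plot + geom_point(alpha = 0.2)

p_fixed
```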

ggplot2: geom options

The geom_*() functions define the type of plot, e.g. geom_point(), geom_line(), geom_boxplot(), geom_path() and geom_smooth(). Several geoms can be combined in one plot by adding them with +.
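
As a small illustration of combining geoms, this sketch draws a line and points from the same mapping. It uses the pressure data set that ships with R rather than the course data, and the object name is arbitrary:

```r
library(ggplot2)

# pressure ships with R: vapor pressure of mercury at different temperatures
p_combined <- ggplot(data = pressure,
                     aes(x = temperature,
                         y = pressure)) +
  geom_line() +   # connect the observations in order of x
  geom_point()    # draw each observation on top of the line

p_combined
```

Layers are drawn in the order they are added, so the points sit on top of the line here.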

ggplot2: smooth function

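geom_smooth() selects a smoothing method based on the size of the data: loess for fewer than 1,000 observations, otherwise a generalized additive model. Use the method argument (e.g. method = "lm") to specify your preferred smoothing method.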

[Figure: ggplot2 lines and smoothing options]
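
A short sketch of the default versus an explicit smoothing method, using the built-in iris data. The object names are arbitrary, and se = FALSE is an optional extra that hides the confidence band:

```r
library(ggplot2)

iris_plot <- ggplot(data = iris,
                    aes(x = Sepal.Length,
                        y = Petal.Length)) +
  geom_point()

# Default: with 150 rows ggplot2 picks loess and prints a message saying so
p_default <- iris_plot + geom_smooth()

# Explicit: a straight-line fit without the confidence band
p_linear <- iris_plot + geom_smooth(method = "lm", se = FALSE)

p_default
p_linear
```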

ggplot2: boxplots

  • Boxplots can be displayed through geom_boxplot().
# Create a boxplot where the x-axis is cut and
#  the y-axis is price divided by carat
ggplot(data = diamonds, 
       aes(x = cut, 
           y = price/carat)) + 
  geom_boxplot()

ggplot2: jitter points

  • Jittered plots with geom_jitter() show all points, spread out slightly so that they overlap less.
# Create a boxplot with jittered points where the x-axis is color and
#  the y-axis is price divided by carat
ggplot(data = diamonds, 
       aes(x = color, 
           y = price/carat)) + 
  geom_boxplot() + 
  geom_jitter()

Your turn to perform

After you have gone through the tutorial please do the following exercises.

  • Create a scatter plot, bar plot and boxplot (as above)
  • Vary the sample and run the same analysis and plots
  • Save your most interesting figure and share it with us
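
For the last two exercises, one possible starting point is sketched below. It draws a random sample from diamonds and saves a plot with ggsave(); the seed, sample size, file name and figure dimensions are all placeholders to change as you like.

```r
library(ggplot2)

# Draw a different random sample by changing the seed or the sample size
set.seed(42)
diamonds_sample <- diamonds[sample(nrow(diamonds), 1000), ]

sample_plot <- ggplot(data = diamonds_sample,
                      aes(x = cut,
                          y = price / carat)) +
  geom_boxplot()

# Write the figure to disk; width and height are in inches by default
out_file <- file.path(tempdir(), "diamonds_sample_boxplot.png")
ggsave(filename = out_file, plot = sample_plot,
       width = 6, height = 4, dpi = 300)
```

The sketch writes to a temporary folder so that it runs anywhere; replace out_file with a path inside your project folder to keep the figure.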